This jupyter notebook tutorial provides a quick introduction to the basic functionalities of the quasildr package.
To run this notebook, you need some extra python dependencies for single-cell data preprocessing and visualization. If you have not install them. You can install with pip install scanpy plotly.
cd PATH_TO_QUASILDR
import pandas as pd
import scanpy as sc
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from quasildr.graphdr import *
We will load a dataset containing 18,213 cells from developing mouse dentate gyrus from Hochgerner et al. 2018.
dataanno = pd.read_csv('./example/Dentate_Gyrus.anno.gz',sep='\t',index_col=0)
dataanno = dataanno.loc[:,'ClusterName']
data = pd.read_csv('./example/Dentate_Gyrus.spliced_data.gz',sep='\t',index_col=0)
adata = sc.AnnData(data.values.T, data.columns.values,data.index.values)
adata.var_names_make_unique()
sc.pp.log1p(adata)
sc.pp.scale(adata)
sc.tl.pca(adata)
We will first visualize the first 3 PCs. For this dataset, the different cell types are mixed in the PCA visualization.
pca50 = adata.obsm['X_pca']
pca50 = pca50/pca50[:,0].std()
traces = [go.Scatter3d(
x = np.asarray(pca50[dataanno.values==ct,0]).flatten(),
y = np.asarray(pca50[dataanno.values==ct,1]).flatten(),
z = np.asarray(pca50[dataanno.values==ct,2]).flatten(),
marker=dict(
size=1,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno)]
iplot(traces)
For demonstration purpose, we first used no_rotation=True which makes sure the output is in the same linear subspace as the PCAs with no rotation, enabling easy comparison.
The cell types are now much better separated in the first three dimensions.
import warnings
warnings.filterwarnings("ignore")
Z = graphdr(pca50,regularization=500,refine_iter=0,no_rotation=True,rescale=True)
traces = [go.Scatter3d(
x = np.asarray(Z[dataanno.values==ct,0]).flatten(),
y = np.asarray(Z[dataanno.values==ct,1]).flatten(),
z = np.asarray(Z[dataanno.values==ct,2]).flatten(),
marker=dict(
size=0.5,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno)]
iplot(traces)
In this case there is a small number of cells floating in between. This is due to the nearest neigbor graph can contain spurious edges connecting unrelated cells.
This usually only happens for a very small number of cells, but you can fix this by performing the optional extra steps of automatically prune spurious graph edges, as shown below:
Z = graphdr(pca50,regularization=500,refine_threshold=12,refine_iter=4,no_rotation=True,rescale=True)
traces = [go.Scatter3d(
x = np.asarray(Z[dataanno.values==ct,0]).flatten(),
y = np.asarray(Z[dataanno.values==ct,1]).flatten(),
z = np.asarray(Z[dataanno.values==ct,2]).flatten(),
marker=dict(
size=0.5,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno)]
iplot(traces)
We can also let GraphDR find the optimal rotation / subspace:
Z = graphdr(pca50,regularization=500,refine_threshold=12,refine_iter=4,no_rotation=False,rescale=True)
traces = [go.Scatter3d(
x = np.asarray(Z[dataanno.values==ct,0]).flatten(),
y = np.asarray(Z[dataanno.values==ct,1]).flatten(),
z = np.asarray(Z[dataanno.values==ct,2]).flatten(),
marker=dict(
size=0.5,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno)]
iplot(traces)
Last note: GraphDR can also use GPU to accelerate the computation even more. Assuming you have a CUDA-enabled GPU with the cupy package installed, just use use_cuda=True and enjoy!
from quasildr.structdr import *
Z = graphdr(pca50,regularization=1000/1.8213,refine_threshold=12,refine_iter=3,no_rotation=True,rescale=False)
Z = Z / Z[:,0].std()
s = Scms(Z[:,:15], bw=0.1, min_radius = 10)
# Here we subsample representative cells to reduce computation time
# This step can be skipped to map all cells to density ridges / trajectories if you have enough computing resource / time.
#For very large data >>10,000 cells. We recommend using the multilevel representation to reduce the data structdr.multilevel_compression.
inds = s.inverse_density_sampling(Z[:,:15], n_samples=2000)
T1, ifilter = s.scms(X=Z[inds,:15],n_jobs = 4, stepsize=1, n_iterations=30, ridge_dimensionality=1, relaxation=0, batch_size=16)
traces = [go.Scatter3d(
x = np.asarray(T1[dataanno.values[inds]==ct,0]).flatten(),
y = np.asarray(T1[dataanno.values[inds]==ct,1]).flatten(),
z = np.asarray(T1[dataanno.values[inds]==ct,2]).flatten(),
marker=dict(
size=1,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno[inds])]
iplot(traces)
#extract structural backbones
from quasildr.utils import *
g_simple, g_mst, ridge_dims = extract_structural_backbone(T1, Z[inds,:15], s, max_angle=90, relaxation=0)
#It may produce NumbaPerformanceWarnings which can be safely ignored.
list_x = []
list_y = []
list_z = []
list_color = []
for i, j in zip(*g_mst.nonzero()):
xs = [T1[i,0], T1[j,0]]
ys = [T1[i,1], T1[j,1]]
zs = [T1[i,2], T1[j,2]]
list_x.extend(xs)
list_y.extend(ys)
list_z.extend(zs)
list_x.append(None)
list_y.append(None)
list_z.append(None)
traces = [go.Scatter3d(
x = np.asarray(T1[dataanno.values[inds]==ct,0]).flatten(),
y = np.asarray(T1[dataanno.values[inds]==ct,1]).flatten(),
z = np.asarray(T1[dataanno.values[inds]==ct,2]).flatten(),
marker=dict(
size=1,
opacity=1,
),
mode='markers',
name=ct
) for ct in np.unique(dataanno[inds])] + [go.Scatter3d(
x=list_x,
y=list_y,
z=list_z,
mode='lines',
line=dict(
color=list_color,
width= 2,
showscale=False,
),
name='MST',)]
iplot(traces)
T, ifilter = s.scms(X=Z[inds,:15],n_jobs = 3, stepsize=1, n_iterations=10, ridge_dimensionality=1, relaxation=0.5, batch_size=16)
ridge_dims = s.relax_bool + 1
traces = [go.Scatter3d(
x = np.asarray(T[:,0]).flatten(),
y = np.asarray(T[:,1]).flatten(),
z = np.asarray(T[:,2]).flatten(),
marker=dict(
size=0.8,
opacity=1,
color=ridge_dims,
),
mode='markers',
) ]
iplot(traces)